Distance phenomena in high-dimensional chemical descriptor spaces: Consequences for similarity-based approaches
Similar resources
Distance phenomena in high-dimensional chemical descriptor spaces: Consequences for similarity-based approaches
Measuring the (dis)similarity of molecules is important for many cheminformatics applications like compound ranking, clustering, and property prediction. In this work, we focus on real-valued vector representations of molecules (as opposed to the binary spaces of fingerprints). We demonstrate the influence which the choice of (dis)similarity measure can have on results, and provide recommendati...
Similarity Search in High-Dimensional Data Spaces
This paper summarizes analytical and experimental results for the nearest neighbor similarity search problem in high-dimensional vector spaces using some kind of space- or data-partitioning scheme. Under the assumptions of uniformity and independence of data, we are able to formally show and to demonstrate that conventional approaches to the nearest neighbor problem degenerate if the dimensional...
Using the Distance Distribution for Approximate Similarity Queries in High-Dimensional Metric Spaces
We investigate the problem of approximate similarity (nearest neighbor) search in high-dimensional metric spaces, and describe how the distance distribution of the query object can be exploited so as to provide probabilistic guarantees on the quality of the result. This leads to a new paradigm for similarity search, called PAC-NN (probably approximately correct nearest neighbor) queries, aiming...
Clindex: Clustering for Similarity Queries in High-Dimensional Spaces
In this paper we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate searches, where one wishes to find many of the data points near a target point, but where one can tolerate missing a few near points. For such searches, our scheme can find near points with high recall in very few IOs and performs significantly better...
Clustering for Approximate Similarity Search in High-Dimensional Spaces
In this paper we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate similarity searches, where one wishes to find many of the data points near a target point, but where one can tolerate missing a few near points. For such searches, our scheme can find near points with high recall in very few IOs and perform sign...
Journal
Journal title: Journal of Computational Chemistry
Year: 2009
ISSN: 0192-8651,1096-987X
DOI: 10.1002/jcc.21218